
Shikun Zhang

What Do Agents Learn from Trajectory-SFT: Semantics or Interfaces?

Feb 02, 2026
Via arXiv

ToolSafe: Enhancing Tool Invocation Safety of LLM-based agents via Proactive Step-level Guardrail and Feedback

Jan 15, 2026
Via arXiv

Modeling Uncertainty Trends for Timely Retrieval in Dynamic RAG

Nov 13, 2025
Via arXiv

Autoformalizer with Tool Feedback

Oct 08, 2025
Via arXiv

SAEMark: Multi-bit LLM Watermarking with Inference-Time Scaling

Aug 11, 2025
Via arXiv

Rethinking the Sampling Criteria in Reinforcement Learning for LLM Reasoning: A Competence-Difficulty Alignment Perspective

May 23, 2025
Via arXiv

MPL: Multiple Programming Languages with Large Language Models for Information Extraction

May 22, 2025
Via arXiv

VLM-R³: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought

May 22, 2025
Via arXiv

Mitigating Spurious Correlations with Causal Logit Perturbation

May 21, 2025
Via arXiv

Can You Really Trust Code Copilots? Evaluating Large Language Models from a Code Security Perspective

May 15, 2025
Via arXiv